Genomic selection for genotype performance and stability using information on multiple traits and multiple environments

Key message The inclusion of multiple traits and multiple environments within a partially separable factor analytic approach for genomic selection provides breeders with an informative framework to utilise genotype by environment by trait interaction for efficient selection. Abstract This paper develops a single-stage genomic selection (GS) approach which incorporates information on multiple traits and multiple environments within a partially separable factor analytic framework. The factor analytic linear mixed model is an effective method for analysing multi-environment trial (MET) datasets, but has not been extended to GS for multiple traits and multiple environments. The advantage of using all information is that breeders can utilise genotype by environment by trait interaction (GETI) to obtain more accurate predictions across correlated traits and environments. The partially separable factor analytic linear mixed model (SFA-LMM) developed in this paper is based on a three-way separable structure, which includes a factor analytic matrix between traits, a factor analytic matrix between environments and a genomic relationship matrix between genotypes. A diagonal matrix is then added to enable a different genotype by environment interaction (GEI) pattern for each trait and a different genotype by trait interaction (GTI) pattern for each environment. The results show that the SFA-LMM provides a better fit than separable approaches and a comparable fit to non-separable and partially separable approaches. The distinguishing feature of the SFA-LMM is that it will include fewer parameters than all other approaches as the number of genotypes, traits and environments increases. Lastly, a selection index is used to demonstrate simultaneous selection for overall performance and stability. This research represents an important continuation in the advancement of plant breeding analyses, particularly with the advent of high-throughput datasets involving a very large number of genotypes, traits and environments. Supplementary Information The online version contains supplementary material available at 10.1007/s00122-023-04305-1.


Sparse implementation of the SFA-LMM
This section extends the sparse implementation of the average information (AI) algorithm for the partially separable factor analytic linear mixed model (SFA-LMM). Thompson et al. (2003) introduced the sparse implementation for factor analytic linear mixed models as a more efficient approach to estimate the key variance parameters, that is the loadings, score variances and specific variances. This approach also provides a natural way to handle cases where some specific variances are zero. When all specific variances are zero, this leads to a (fully) reduced rank factor analytic linear mixed model (Kirkpatrick and Meyer, 2004).

Preliminaries
The SFA-LMM for y, the ns-vector of phenotypic data on v genotypes, p environments and s traits, is: where is the n × vk t k e separable factor analytic design matrix, and ε = (I s ⊗ Z)u n + (I s ⊗ Z p )u p + e, with var(ε) = R ε . The SFA-LMM in Equation 1 can be extended by reordering and partitioning the regression residuals as δ te = (δ ⊤ te1 , δ ⊤ te2 ) ⊤ , where δ te1 is a vp 1 s 1 -vector with no zero elements and δ te2 is a v(p 1 s 2 + p 2 s)-vector with all zero elements, such that p = p 1 + p 2 and s = s 1 + s 2 , with p 1 ≥ 1 due to the constraints required during estimation (see Section X of the manuscript). Two simpler models can also be obtained: 1. When p 1 = p and s 1 = s, no specific variances are zero. This model is the conventional SFA-LMM. 2. When s 1 = 0, all specific variances are effectively zero. This model is referred to as the reduced rank SFA-LMM. The model considered below allows some specific variances to be non-zero and some to be zero. The SFA-LMM can therefore be written as: where [Z 1 Z 2 ] is partitioned conformably with δ te . It is assumed that δ te2 = 0, and that: ) .
In this model, the genotype scores and regression residuals are assumed to be independent. Thompson et al. (2003) also present an equivalent formulation where the random effects are assumed to be dependent. This formulation will be extended for the SFA-LMM in a subsequent paper. The mixed model equations for the SFA-LMM in Equation 2 are given by: Prediction of genotype scores and regression residuals Following Smith et al. (2019), Equations 2 and 3 can be written as: The BLUPs of the key random effects are obtained via absorption of C onto y ⊤ R −1 ε y, which gives: is the ns × ns residual sum of squares matrix. Absorption of C also produces the prediction error variance matrices, with: . These matrices are equivalent to the diagonal blocks in C −1 corresponding tof te andδ te1 , respectively.
Lastly, note that the components in the equations above assume the variance parameters are known. The variance parameters are unknown, however, so they must be estimated. The resulting predictions are therefore referred to as EBLUPs.

Estimation of loadings and specific variances
The REML estimates of the key variance parameters are obtained by maximising the residual log-likelihood, which is given by: where y 2 = L ⊤ 2 y such that L ⊤ 2 X = 0 and y r = y − Xτ , with var(y r ) = H (Verbyla, 1990). In particular, the REML estimates are obtained by solving a set of (score) equations, which are given by: where κ is the b-vector of variance parameters in the IFA-LMM. The score equations were originally solved using numerical approaches based on the observed or expected information. The AI algorithm is based on the average information, and computes updates as: where κ (m) is the b-vector of variance parameters, I (m) a is the AI matrix and s(κ (m) ) is the score equations for the m th iteration. Note that the superscript "m" is removed in the following for brevity. The score equation for κ i is given by (Gilmour et al., 1995): whereḢ i = ∂H ∂κi and q i is the working variate for the i th variance parameter, which is given by: Next, let the vectors of key variance parameters in the SFA-LMM be denoted by: where a = t and e for traits and environments. The working variates for the key variance parameters are given by: with a = t or e. The working variates for the specific variances can be simplified since the regression residuals are independent across trait by environment combinations. It therefore follows that: where Z 1i is the n 1 × vp 1 design matrix for the i th trait with n 1 = ∑ p1 i=1 n i or Z 1i is the n i s 1 ×vs 1 design matrix for the i th environment, P i· are the corresponding n 1 or n i s 1 rows in P and Z 1·i are the corresponding vp 1 or vs 1 columns in Z 1 .
The trace term in Equation 8 is then given by: where C ·f te is the column in the inverse coefficient matrix corresponding tof te and C ·δte 1 is the column corresponding toδ te1 .
The trace terms for the specific variances can be further simplified as: where R εi is the i th diagonal block in R ε , W i· are the n 1 or n i s 1 rows in W corresponding to the i th trait or environment and C ·δte 1 i are the vp 1 or vs 1 columns in C corresponding toδ te1 i . The trace terms avoid working with the dense genomic relationship matrix. When G g is not prohibitively large, the trace term for the specific variances can also be computed as: where C η1 i η1 i is the prediction error variance matrix of η 1i = Z ⊤ 1i P i· y. Lastly, the AI matrix in Equation 7 is given by: where Q = [q 1 q 2 . . . q b ] is the ns × b matrix of working variates across all b variance parameters. The AI matrix is obtained via absorption of C onto Q ⊤ R −1 ε Q (Smith, 1999).